feat: add Vision Transformer (ViT) implementation for image classification #13334

Open

devvratpathak wants to merge 7 commits into TheAlgorithms:master from
Conversation
…features section

- Add comprehensive table of contents for easy navigation
- Include detailed installation steps with virtual environment setup
- Add usage examples showing how to run and import algorithms
- Create features section listing all algorithm categories
- Add explicit license section with MIT License information
- Expand contributing section with quick start guide
- Add about section explaining repository purpose

Fixes TheAlgorithms#13111
…ation

- Implement complete ViT architecture with patch embedding
- Add positional encoding with learnable CLS token
- Include scaled dot-product attention mechanism
- Implement transformer encoder blocks with layer normalization
- Add feed-forward network with GELU activation
- Include comprehensive docstrings and type hints
- Add doctests for all functions
- Provide example usage demonstrating the complete pipeline

Fixes TheAlgorithms#13326
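The commit above mentions patch embedding, the first stage of the ViT pipeline. As a rough illustration (not the PR's actual code; the function name and shapes are assumptions), splitting a `(H, W, C)` image into flattened non-overlapping 16x16 patches can be done with a reshape/transpose in NumPy:

```python
import numpy as np


def extract_patches(image: np.ndarray, patch_size: int = 16) -> np.ndarray:
    """Split an (H, W, C) image into flattened non-overlapping patches.

    Returns an array of shape (num_patches, patch_size * patch_size * C).
    Assumes H and W are divisible by patch_size.
    """
    h, w, c = image.shape
    return (
        image.reshape(h // patch_size, patch_size, w // patch_size, patch_size, c)
        .transpose(0, 2, 1, 3, 4)  # group the two patch-grid axes together
        .reshape(-1, patch_size * patch_size * c)
    )


# A 224x224 RGB image yields 14*14 = 196 patches of dimension 16*16*3 = 768.
patches = extract_patches(np.zeros((224, 224, 3)))
print(patches.shape)  # (196, 768)
```

These flattened patches are then linearly projected to the embedding dimension and prepended with the CLS token.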
- Replace Optional with X | None syntax (UP045)
- Use np.random.Generator instead of legacy np.random methods (NPY002)
- Fix line length violations (E501)
- Assign f-string literals to variables in exceptions (EM102)
- Remove unused variables and parameters (RUF059, F841)
- Add noqa comment for intentionally unused API parameter
- All doctests still pass successfully
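The NPY002 fix above refers to ruff's rule against the legacy global-state `np.random` API. A small sketch of the migration (variable names here are illustrative, not from the PR):

```python
import numpy as np

# Legacy calls flagged by ruff's NPY002 rule:
#   weights = np.random.randn(4, 4)
#   indices = np.random.randint(0, 10, size=5)

# Preferred modern API: an explicit, seedable Generator instance.
rng = np.random.default_rng(seed=42)
weights = rng.standard_normal((4, 4))  # replaces np.random.randn
indices = rng.integers(0, 10, size=5)  # replaces np.random.randint
print(weights.shape, indices.shape)
```

Besides satisfying the linter, an explicit `Generator` makes seeding local and reproducible instead of mutating global RNG state.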
Automated review generated by algorithms-keeper. If there's any problem regarding this review, please open an issue about it.
algorithms-keeper commands and options
algorithms-keeper actions can be triggered by commenting on this PR:
- `@algorithms-keeper review` to trigger the checks for only added pull request files
- `@algorithms-keeper review-all` to trigger the checks for all the pull request files, including the modified files. As we cannot post review comments on lines not part of the diff, this command will post all the messages in one comment.

NOTE: Commands are in beta and so this feature is restricted only to a member or owner of the organization.
```python
    return output, attention_weights


def layer_norm(x: np.ndarray, epsilon: float = 1e-6) -> np.ndarray:
```

**algorithms-keeper**: Please provide descriptive name for the parameter: `x`
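For context on the function the bot is flagging: given the signature and the `return (x - mean) / (std + epsilon)` line visible in the diff, a minimal self-contained sketch of such a layer normalization (using the `embeddings` name a later commit adopts; the body here is a reconstruction, not the PR's exact code) could look like:

```python
import numpy as np


def layer_norm(embeddings: np.ndarray, epsilon: float = 1e-6) -> np.ndarray:
    """Normalize the last axis to zero mean and unit variance."""
    mean = embeddings.mean(axis=-1, keepdims=True)
    std = embeddings.std(axis=-1, keepdims=True)
    return (embeddings - mean) / (std + epsilon)


out = layer_norm(np.arange(12.0).reshape(3, 4))
print(np.allclose(out.mean(axis=-1), 0.0))  # True: each row is centered
```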
```python
    return (x - mean) / (std + epsilon)


def feedforward_network(x: np.ndarray, hidden_dim: int = 3072) -> np.ndarray:
```

**algorithms-keeper**: Please provide descriptive name for the parameter: `x`
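The `feedforward_network` signature in the diff matches the ViT MLP block: project up to `hidden_dim`, apply GELU, project back. A hedged sketch with random weights (for illustration only; the PR's actual weight handling is not shown in this diff excerpt):

```python
import numpy as np


def gelu(values: np.ndarray) -> np.ndarray:
    """Tanh approximation of the GELU activation used in transformer MLPs."""
    return 0.5 * values * (
        1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (values + 0.044715 * values**3))
    )


def feedforward_network(embeddings: np.ndarray, hidden_dim: int = 3072) -> np.ndarray:
    """Two-layer MLP: expand to hidden_dim, apply GELU, project back."""
    rng = np.random.default_rng(0)  # random weights for illustration only
    dim = embeddings.shape[-1]
    w1 = rng.standard_normal((dim, hidden_dim)) * 0.02
    w2 = rng.standard_normal((hidden_dim, dim)) * 0.02
    return gelu(embeddings @ w1) @ w2


out = feedforward_network(np.ones((1, 5, 8)), hidden_dim=16)
print(out.shape)  # (1, 5, 8) — the output keeps the embedding dimension
```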
```python
def transformer_encoder_block(
    x: np.ndarray, num_heads: int = 12, hidden_dim: int = 3072  # noqa: ARG001
```

**algorithms-keeper**: Please provide descriptive name for the parameter: `x`
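The encoder block being reviewed composes the pieces above: layer norm, self-attention, and the feed-forward sublayer, each wrapped in a residual connection. A simplified single-head sketch (identity projections, no learned weights — an assumption for brevity, not the PR's implementation):

```python
import numpy as np


def _norm(embeddings: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    mean = embeddings.mean(axis=-1, keepdims=True)
    return (embeddings - mean) / (embeddings.std(axis=-1, keepdims=True) + eps)


def _softmax(scores: np.ndarray) -> np.ndarray:
    exp = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return exp / exp.sum(axis=-1, keepdims=True)


def encoder_block(embeddings: np.ndarray) -> np.ndarray:
    """Pre-norm encoder block: attention sublayer + MLP sublayer, each residual."""
    normed = _norm(embeddings)
    dim = embeddings.shape[-1]
    # Single-head self-attention with identity Q/K/V projections, for brevity.
    weights = _softmax(normed @ normed.T / np.sqrt(dim))
    attended = embeddings + weights @ normed  # first residual connection
    # Feed-forward sublayer stand-in (identity MLP) with second residual.
    return attended + _norm(attended)


rng = np.random.default_rng(0)
tokens = rng.standard_normal((4, 3))  # 4 tokens, embedding dim 3
print(encoder_block(tokens).shape)  # (4, 3): shape is preserved through the block
```

The key invariant worth noting is that an encoder block maps `(seq, dim)` to `(seq, dim)`, which is what lets ViT stack 12 of them.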
for more information, see https://pre-commit.ci
- Rename 'x' to 'embeddings' in layer_norm, feedforward_network, and transformer_encoder_block functions
- Update all docstring examples to use 'embeddings'
- Improves code readability per algorithms-keeper bot feedback
- Fix noqa comment placement for unused num_heads parameter
- All doctests and ruff checks pass
for more information, see https://pre-commit.ci
Describe your change:
This PR adds a comprehensive Vision Transformer (ViT) implementation to the `computer_vision` folder for image classification tasks, implementing the architecture from "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale" (Dosovitskiy et al., 2020). The implementation includes patch embedding, positional encoding, attention mechanism, layer normalization, feed-forward network, transformer encoder blocks, and the complete ViT pipeline. All functions have comprehensive docstrings, type hints, doctests, and pass all ruff checks.
Fixes #13326
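Of the components the description lists, the attention mechanism is the mathematical core: attention(Q, K, V) = softmax(QKᵀ/√d_k)V. A self-contained sketch consistent with the `return output, attention_weights` line visible in the diff (function and variable names are illustrative, not necessarily the PR's):

```python
import numpy as np


def scaled_dot_product_attention(
    query: np.ndarray, key: np.ndarray, value: np.ndarray
) -> tuple[np.ndarray, np.ndarray]:
    """Compute softmax(Q K^T / sqrt(d_k)) V and return output plus weights."""
    d_k = query.shape[-1]
    scores = query @ key.swapaxes(-2, -1) / np.sqrt(d_k)
    # Numerically stable softmax over the key axis.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ value, weights


rng = np.random.default_rng(0)
q = rng.standard_normal((4, 8))  # 4 tokens, head dimension 8
output, attention_weights = scaled_dot_product_attention(q, q, q)
print(output.shape, attention_weights.shape)  # (4, 8) (4, 4)
```

Each row of `attention_weights` sums to 1, so the output is a convex combination of the value vectors.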
Checklist: